27 research outputs found

    JSONoid: Monoid-based Enrichment for Configurable and Scalable Data-Driven Schema Discovery

    Schema discovery is an important aspect of working with data in formats such as JSON. Unlike relational databases, JSON datasets often do not have associated structural information. Consumers of such datasets are often left to browse through data in an attempt to observe commonalities in structure across documents to construct suitable code for data processing. However, this process is time-consuming and error-prone. Existing distributed approaches to mining schemas present a significant usability advantage as they provide useful metadata for large data sources. However, depending on the data source, ad hoc queries for estimating other properties to help with crafting an efficient data pipeline can be expensive. We propose JSONoid, a distributed schema discovery process augmented with additional metadata in the form of monoid data structures that are easily maintainable in a distributed setting. JSONoid subsumes several existing approaches to distributed schema discovery with similar performance. Our approach also adds significant useful additional information about data values to discovered schemas with linear scalability.

    Comprehending Semantic Types in JSON Data with Graph Neural Networks

    Semantic types are a more powerful and detailed way of describing data than atomic types such as strings or integers. They establish connections between columns and concepts from the real world, providing more nuanced and fine-grained information that can be useful for tasks such as automated data cleaning, schema matching, and data discovery. Existing deep learning models trained on large text corpora have been successful at performing single-column semantic type prediction for relational data. However, in this work, we propose an extension of the semantic type prediction problem to JSON data, labeling the types based on JSON Paths. Similar to columns in relational data, JSON Path is a query language that enables the navigation of complex JSON data structures by specifying the location and content of the elements. We use a graph neural network to comprehend the structural information within collections of JSON documents. Our model outperforms a state-of-the-art existing model in several cases. These results demonstrate the ability of our model to understand complex JSON data and its potential usage for JSON-related data processing tasks

    Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources

    Apache Calcite is a foundational software framework that provides query processing, optimization, and query language support to many popular open-source data processing systems such as Apache Hive, Apache Storm, Apache Flink, Druid, and MapD. Calcite's architecture consists of a modular and extensible query optimizer with hundreds of built-in optimization rules, a query processor capable of processing a variety of query languages, an adapter architecture designed for extensibility, and support for heterogeneous data models and stores (relational, semi-structured, streaming, and geospatial). This flexible, embeddable, and extensible architecture is what makes Calcite an attractive choice for adoption in big-data frameworks. It is an active project that continues to introduce support for new types of data sources, query languages, and approaches to query processing and optimization. Comment: SIGMOD'1

    The attrition rate of licensed chiropractors in California: an exploratory ecological investigation of time-trend data

    Background: The authors hypothesized the attrition rate of licensed chiropractors in California has gradually increased over the past several decades. "Attrition" as determined for this study is defined as a loss of legal authority to practice chiropractic for any reason during the first 10 years after the license was issued. The percentage of license attrition after 10 years was determined for each group of graduates licensed in California each year between 1970 and 1998. The cost of tuition, the increase in the supply of licensed chiropractors and the ratio of licensed chiropractors to California residents were examined as possible influences on the rate of license attrition. Methods: The attrition rate was determined by a retrospective analysis of license status data obtained from the California Department of Consumer Affairs. Other variables were determined from US Bureau of Census data, survey data from the American Chiropractic Association and catalogs from a US chiropractic college. Results: The 10-year attrition rate rose from 10% for those graduates licensed in 1970 to a peak of 27.8% in 1991. The 10-year attrition rate has since remained between 20-25% for the doctors licensed between 1992-1998. Conclusions: Available evidence supports the hypothesis that the attrition rate for licensed chiropractors in the first 10 years of practice has risen in the past several decades.

    A united statement of the global chiropractic research community against the pseudoscientific claim that chiropractic care boosts immunity.

    BACKGROUND: In the midst of the coronavirus pandemic, the International Chiropractors Association (ICA) posted reports claiming that chiropractic care can impact the immune system. These claims clash with recommendations from the World Health Organization and World Federation of Chiropractic. We discuss the scientific validity of the claims made in these ICA reports. MAIN BODY: We reviewed the two reports posted by the ICA on their website on March 20 and March 28, 2020. We explored the method used to develop the claim that chiropractic adjustments impact the immune system and discuss the scientific merit of that claim. We provide a response to the ICA reports and explain why this claim lacks scientific credibility and is dangerous to the public. More than 150 researchers from 11 countries reviewed and endorsed our response. CONCLUSION: In their reports, the ICA provided no valid clinical scientific evidence that chiropractic care can impact the immune system. We call on regulatory authorities and professional leaders to take robust political and regulatory action against those claiming that chiropractic adjustments have a clinical impact on the immune system

    Tropical Data: Approach and Methodology as Applied to Trachoma Prevalence Surveys

    PURPOSE: Population-based prevalence surveys are essential for decision-making on interventions to achieve trachoma elimination as a public health problem. This paper outlines the methodologies of Tropical Data, which supports work to undertake those surveys. METHODS: Tropical Data is a consortium of partners that supports health ministries worldwide to conduct globally standardised prevalence surveys that conform to World Health Organization recommendations. Founding principles are health ministry ownership, partnership and collaboration, and quality assurance and quality control at every step of the survey process. Support covers survey planning, survey design, training, electronic data collection and fieldwork, and data management, analysis and dissemination. Methods are adapted to meet local context and needs. Customisations, operational research and integration of other diseases into routine trachoma surveys have also been supported. RESULTS: Between 29th February 2016 and 24th April 2023, 3373 trachoma surveys across 50 countries have been supported, resulting in 10,818,502 people being examined for trachoma. CONCLUSION: This health ministry-led, standardised approach, with support from the start to the end of the survey process, has helped all trachoma elimination stakeholders to know where interventions are needed, where interventions can be stopped, and when elimination as a public health problem has been achieved. Flexibility to meet specific country contexts, adaptation to changes in global guidance and adjustments in response to user feedback have facilitated innovation in evidence-based methodologies, and supported health ministries to strive for global disease control targets

    Learning from Uncurated Regular Expressions

    Significant work has been done on learning regular expressions from a set of data values. Depending on the domain, this approach can be very successful. However, significant time is required to learn these expressions and the resulting expressions can either become very complex or inaccurate in the presence of dirty data. The alternative of manually writing regular expressions becomes unattractive when faced with a large number of values which must be matched. As an alternative, we propose learning from a large corpus of manually authored, but uncurated regular expressions mined from a public repository. The advantage of this approach is that we are able to extract salient features from a set of strings with limited overhead to feature engineering. Since the set of regular expressions covers a wide range of application domains, we expect them to be widely applicable. To demonstrate the potential effectiveness of our approach, we train a model using the extracted corpus of regular expressions for the task of semantic type classification. While our approach generally yields results that are inferior to the state of the art, our training data is much smaller and simpler and a closer analysis of the performance results suggests this approach holds significant promise. We also demonstrate the possibility of using uncurated regular expressions for unsupervised learning.

    Promoting the use of self-management in patients with spine pain managed by chiropractors and chiropractic interns: barriers and design of a theory-based knowledge translation intervention

    Background: The literature supports the effectiveness of self-management support (SMS) to improve health outcomes of patients with chronic spine pain. However, patient engagement in SMS programs is suboptimal. The objectives of this study were to: 1) assess participation in self-care (i.e. activation) among patients with spine pain, 2) identify patients’ barriers and enablers to using SMS, and 3) map behaviour change techniques (BCTs) to key barriers to inform the design of a knowledge translation (KT) intervention aimed to increase the use of SMS. Methods: In summer 2016, we invited 250 patients with spine pain seeking care at the Canadian Memorial Chiropractic College in Ontario, Canada to complete the Patient Activation Measure (PAM) survey to assess the level of participation in self-care. We subsequently conducted individual interviews, in summer 2017, based on the Theoretical Domains Framework (TDF) in a subset of patients to identify potential challenges to using SMS. The interview guide included 20 open-ended questions and accompanying probes. Findings were deductively analysed guided by the TDF. A panel of 7 experts mapped key barriers to BCTs, designed a KT intervention, and selected the modes of delivery. Results: Two hundred and twenty-three patients completed the PAM. Approximately 24% of respondents were not actively involved in their care. Interview findings from 13 spine pain patients suggested that the potential barriers to using SMS corresponded to four TDF domains: Environmental Context and Resources; Emotion; Memory, Attention & Decision-Making; and Behavioural Regulation. The proposed theory-based KT intervention includes paper-based educational materials, webinars and videos, summarising and demonstrating the therapeutic recommendations including exercises and other lifestyle changes. In addition, the KT intervention includes Brief Action Planning, a SMS strategy based on motivational interviewing, along with a SMART plan and reminders. 
    Conclusions: Almost one quarter of study participants were not actively engaged in their spine care. Key barriers likely to influence uptake of SMS among patients were identified and used to inform the design of a theory-based KT intervention to increase their participation level. The proposed multi-component KT intervention may be an effective strategy to optimize the quality of spine pain care and improve patients’ health outcomes.

    Rehabilitative management of back pain in children: protocol for a mixed studies systematic review

    Little is known about effective, efficient and acceptable management of back pain in children. A comprehensive and updated evidence synthesis can help to inform clinical practice. Objective: To inform clinical practice, we aim to conduct a systematic review of the literature and synthesise the evidence regarding effective, cost-effective and safe rehabilitation interventions for children with back pain to improve their functioning and other health outcomes. Prospero registration number: CRD42019135009